Add auto-chunking and concurrent dispatch for bulk record operations by abelmilash-msft · Pull Request #162 · microsoft/PowerPlatform-DataverseClient-Python

abelmilash-msft · 2026-04-13T20:05:40Z

Summary

Add chunking with concurrency to address #156

records.create, records.update, and records.upsert now automatically split inputs exceeding 1,000 records into 1,000-record chunks — no caller intervention required.
New max_workers parameter (default 1, max 3) enables concurrent chunk dispatch via ThreadPoolExecutor. Values above the cap emit a UserWarning rather than raising.
Adds an optional prefetch_pages parameter (default 0, sequential) to _get_multiple: when set to 1, the next-page HTTP request is submitted immediately after receiving the current page — before yielding to the caller — so network I/O overlaps with per-page processing (e.g. transforms, DB writes); values above 1 are capped at 1.
Transient errors (429, 502, 503, 504) are retried per-chunk up to 3 times, sleeping for the Retry-After duration plus random jitter to desynchronise concurrent workers.
ContextVar values (e.g. correlation IDs) are propagated to worker threads via copy_context() — each future gets its own snapshot to avoid cross-thread context corruption.
Picklist label cache cold-start is serialised with double-checked locking so concurrent workers don't race to populate it.
dataframe.create and dataframe.update forward max_workers through to the record-level operations.
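As a rough illustration of the chunking and dispatch behaviour summarised above, the following is a minimal sketch rather than the SDK's actual code. The names `split_into_chunks`, `dispatch_chunks`, `send_chunk`, and `correlation_id` are assumptions made for this example; only the 1,000-record chunk size, the cap of 3 workers, the UserWarning, and the per-future `copy_context()` snapshot come from the description.

```python
import warnings
from concurrent.futures import ThreadPoolExecutor
from contextvars import ContextVar, copy_context

BATCH_SIZE = 1000  # chunk size stated in the summary
MAX_WORKERS = 3    # worker cap stated in the summary

# Hypothetical context variable standing in for a correlation ID.
correlation_id = ContextVar("correlation_id", default="unset")


def split_into_chunks(records, size=BATCH_SIZE):
    """Split a list into consecutive chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def dispatch_chunks(records, send_chunk, max_workers=1):
    """Send each chunk via send_chunk and return results in input order."""
    chunks = split_into_chunks(records)
    if max_workers == 1 or len(chunks) <= 1:
        # Sequential path: no thread pool needed.
        return [item for chunk in chunks for item in send_chunk(chunk)]
    if max_workers > MAX_WORKERS:
        warnings.warn(
            f"max_workers={max_workers} exceeds the maximum of {MAX_WORKERS}; "
            f"capping to {MAX_WORKERS}.",
            UserWarning,
            stacklevel=2,
        )
        max_workers = MAX_WORKERS
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # copy_context() is called once per chunk, so every future runs in
        # its own snapshot of the caller's context variables.
        futures = [
            pool.submit(copy_context().run, send_chunk, chunk) for chunk in chunks
        ]
        # Collect in submission order so results line up with the input.
        results = []
        for future in futures:
            results.extend(future.result())
    return results
```

A separate snapshot per future matters because a single `Context` object cannot be entered by two threads at the same time; sharing one would raise `RuntimeError` under concurrency.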

Load test results

Tested against a live Dataverse environment (25,000 records).

Concurrency comparison — 3,000 records (3 chunks of 1,000)

Operation	max_workers=1	max_workers=2	max_workers=3	Speedup (3 workers vs. 1)
Create	91.74s	63.57s	32.44s	2.83×
Update	94.42s	61.05s	32.68s	2.89×
Upsert	115.66s	75.40s	41.54s	2.78×

High-volume load — 25,000 records (25 chunks of 1,000, max_workers=15)

Load tests were run with _MAX_WORKERS=15 to validate throughput at higher concurrency.

Operation	Time	Throughput
Create	86.02s	291 records/s
Update	86.95s	288 records/s
Upsert (create)	135.52s	184 items/s
Upsert (update)	515.13s	49 items/s

Atomicity note

Chunked operations are not atomic. If a chunk fails mid-way, earlier chunks are already committed. Callers that require atomicity should limit input to ≤ 1,000 records per call.
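To make the failure mode concrete, here is a small simulation, not SDK code. `FakeTable` and `create_chunked` are hypothetical names, and the sketch assumes that each individual chunk request is all-or-nothing while earlier chunks stay committed.

```python
class FakeTable:
    """Stand-in for a server table; each chunk is committed all-or-nothing."""

    def __init__(self):
        self.rows = []

    def create_multiple(self, chunk):
        # Reject the whole chunk if any record is invalid.
        if any("name" not in record for record in chunk):
            raise ValueError("chunk rejected: record without a name")
        self.rows.extend(chunk)


def create_chunked(table, records, size=1000):
    """Send records chunk by chunk; an exception stops later chunks only."""
    for start in range(0, len(records), size):
        table.create_multiple(records[start:start + size])
```

With 2,500 records where record 1,500 is invalid, the second chunk raises, the first 1,000 rows remain in the table, and the third chunk is never sent.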

Unit tests

test_multiple_chunking.py is a new test file introduced by this PR — it did not exist in main.

Metric	Count
New tests (test_multiple_chunking.py)	93
Total suite	1,246 passed

Coverage on changed public modules (full suite):

Module	Coverage
`operations/records.py`	100%
`operations/dataframe.py`	100%
Overall package	94%

Tests cover: auto-chunking boundaries (999/1000/1001 records), sequential and concurrent dispatch, chunk ordering, transient retry with jitter, ContextVar propagation to worker threads, max_workers cap warning, and exception propagation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Replace silent capping in _dispatch_chunks with an explicit UserWarning so callers are informed when their max_workers value is reduced. Revert _MAX_WORKERS to 3. Remove two internal implementation comments from _upsert_multiple. Update tests to assert the warning is emitted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds built-in chunking (1,000-record batches) and optional concurrent dispatch for bulk record operations, including retry-with-jitter on transient errors and thread ContextVar propagation.

Changes:

Introduces _dispatch_chunks to run chunked bulk requests sequentially or via ThreadPoolExecutor, with per-chunk transient retry logic and ContextVar propagation.
Extends records.create/update/upsert and dataframe.create/update APIs with max_workers forwarding (default 1) and updates docs/examples accordingly.
Adds/updates unit tests for chunking boundaries, concurrency ordering, retries, warnings/capping, and picklist cache cold-start locking.
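The per-chunk transient retry mentioned in this overview could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `HttpError`, `send_with_retry`, `max_jitter`, and `default_delay` are hypothetical, and "up to 3 times" is read here as three retries after the first attempt.

```python
import random
import time

TRANSIENT_STATUS = {429, 502, 503, 504}  # status codes listed in the summary
MAX_RETRIES = 3                          # retry count listed in the summary


class HttpError(Exception):
    """Hypothetical error type carrying the status and Retry-After seconds."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def send_with_retry(send_chunk, chunk, sleep=time.sleep,
                    max_jitter=1.0, default_delay=1.0):
    """Call send_chunk(chunk), retrying transient failures with jitter."""
    attempt = 0
    while True:
        try:
            return send_chunk(chunk)
        except HttpError as exc:
            # Non-transient errors and exhausted retries propagate unchanged.
            if exc.status not in TRANSIENT_STATUS or attempt >= MAX_RETRIES:
                raise
            attempt += 1
            delay = exc.retry_after if exc.retry_after is not None else default_delay
            # Random jitter keeps concurrent workers from retrying in lockstep.
            sleep(delay + random.uniform(0, max_jitter))
```

Passing `sleep` as a parameter is only a convenience for testing; it lets a test record the requested delays instead of actually waiting.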

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.


File	Description
tests/unit/test_records_operations.py	Updates mocks/assertions to include `max_workers` forwarding.
tests/unit/test_dataframe_operations.py	Adds `max_workers` forwarding assertions and new forwarding-focused tests.
tests/unit/test_client.py	Updates client-level tests for new `max_workers` argument propagation.
tests/unit/data/test_multiple_chunking.py	Adds comprehensive chunking/concurrency/retry/cache-lock tests (new file).
src/PowerPlatform/Dataverse/operations/records.py	Adds `max_workers` kwarg to public record APIs + validation + docs.
src/PowerPlatform/Dataverse/operations/dataframe.py	Forwards `max_workers` into record operations + updates docs.
src/PowerPlatform/Dataverse/data/_odata.py	Implements chunking, concurrency, retry-with-jitter, ContextVar propagation, picklist cache locking.
src/PowerPlatform/Dataverse/claude_skill/dataverse-sdk-use/SKILL.md	Documents auto-chunking and `max_workers` behavior.
.claude/skills/dataverse-sdk-use/SKILL.md	Mirrors skill doc update for auto-chunking and concurrency.
examples/advanced/walkthrough.py	Adds walkthrough section demonstrating auto-chunking; renumbers later sections.
README.md	Documents `max_workers` and large-batch auto-chunking semantics.
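The picklist cache locking listed for `_odata.py` is described in the summary as double-checked locking. A minimal sketch of that pattern follows; the class name `PicklistLabelCache` and the `loader` callable are assumptions for illustration and do not reflect the SDK's internal structure.

```python
import threading


class PicklistLabelCache:
    """Lazily populated cache guarded by double-checked locking."""

    def __init__(self, loader):
        self._loader = loader          # hypothetical callable that fetches labels
        self._labels = None
        self._lock = threading.Lock()

    def get(self):
        labels = self._labels
        if labels is None:             # first check, without the lock
            with self._lock:
                labels = self._labels  # second check, under the lock
                if labels is None:
                    labels = self._loader()
                    self._labels = labels
        return labels
```

Once the cache is warm, readers take the fast path and never touch the lock; only the cold start is serialised, so the loader runs once even when several workers arrive together.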


… validation - Fix .. note:: in records.create/update/upsert to mention concurrent dispatch when max_workers > 1, not just sequential chunking (comment #1) - Import _MAX_WORKERS in records.py and build the ValueError message from it so the text stays accurate if the cap changes (comment #2) - Add explicit ValueError in _dispatch_chunks for non-int or < 1 max_workers, with three new tests covering zero, negative, and non-int inputs (comments #4/#5) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


azure-pipelines · 2026-04-14T16:33:23Z

Azure Pipelines: Successfully started running 1 pipeline(s).


azure-pipelines · 2026-04-14T16:49:37Z

Azure Pipelines: Successfully started running 1 pipeline(s).

sagebree · 2026-04-20T20:11:03Z

+        warnings.warn(
+            f"max_workers={max_workers} exceeds the maximum of {_MAX_WORKERS}; capping to {_MAX_WORKERS}.",
+            UserWarning,
+            stacklevel=2,


Should this be stacklevel=4 for the warning?

Also, either move the warning after the single-chunk check, or add 'no concurrency will be used' to the warning message when len(chunks) == 1.

Updated this on the latest commit (5b2f6e9)
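For readers unfamiliar with `stacklevel`, the sketch below shows how the value decides which frame a warning is attributed to. The three-layer call chain (`create` -> `_create_multiple` -> `_dispatch`) is a hypothetical stand-in for the SDK's real call depth, which this thread does not spell out.

```python
import warnings


def _dispatch(max_workers, stacklevel):
    # Innermost layer: this is where the warning is actually issued.
    warnings.warn(
        f"max_workers={max_workers} exceeds the cap",
        UserWarning,
        stacklevel=stacklevel,
    )


def _create_multiple(max_workers, stacklevel):
    _dispatch(max_workers, stacklevel)


def create(max_workers, stacklevel):
    _create_multiple(max_workers, stacklevel)


def caller_code(stacklevel):
    # Plays the role of user code calling the public API.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        create(10, stacklevel)
    return caught[0]


def attributed_function(stacklevel):
    """Return the name of the function whose line the warning points at."""
    warning = caller_code(stacklevel)
    funcs = [_dispatch, _create_multiple, create, caller_code]
    # The owning function is the last one defined at or before that line.
    owner = max(
        (f for f in funcs if f.__code__.co_firstlineno <= warning.lineno),
        key=lambda f: f.__code__.co_firstlineno,
    )
    return owner.__name__
```

In this chain, `stacklevel=2` points at `_create_multiple`, an internal frame, while `stacklevel=4` points at `caller_code`, which is where a user would want to see the warning reported.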


abelmilash-msft · 2026-04-22T00:48:05Z

Added page prefetching (#3 enhancement suggested on #156) on commit 9a1f490
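The prefetching idea can be sketched as a generator that submits the next-page request before yielding the current page. This is an assumption-laden illustration: `iter_pages` and `fetch_page` are hypothetical names, `fetch_page(url)` is assumed to return `(items, next_url_or_None)`, and any positive `prefetch_pages` is treated as 1, matching the cap described in the summary.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_pages(fetch_page, first_url, prefetch_pages=0):
    """Yield pages of items, optionally fetching one page ahead."""
    if prefetch_pages <= 0:
        # Sequential mode: fetch, yield, then fetch the next page.
        url = first_url
        while url is not None:
            items, url = fetch_page(url)
            yield items
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_page, first_url)
        while future is not None:
            items, next_url = future.result()
            # Submit the next request before yielding, so the network call
            # overlaps with whatever the caller does with `items`.
            future = pool.submit(fetch_page, next_url) if next_url is not None else None
            yield items
```

If the caller stops iterating early, closing the generator exits the `with` block, which waits for the one in-flight request to finish before the pool shuts down.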


sagebree · 2026-05-11T04:30:46Z

+            Lists exceeding 1,000 records are automatically split into chunks
+            of up to 1,000 records; dispatched sequentially by default, or
+            concurrently when ``max_workers > 1`` (capped to ``_MAX_WORKERS``).
+            This is **not atomic** — a failure mid-way may leave earlier chunks


Sorry, I missed this in the first pass.

With chunking, the user's atomic request is effectively converted into multiple non-atomic operations, which changes the original behavior. We should make this an opt-in overload, so that the default behavior stays as-is.

I'm proposing

From:
ids = client.records.create("account", payloads, max_workers=1)

To:

ids = client.records.create("account", payloads)  # default: non_atomic_chunking=False, sends all records at once
ids = client.records.create("account", payloads, non_atomic_chunking=True)  # sends in chunks; the parameter name makes clear that chunking is not atomic

Also, hide the number of workers: it is an internal detail that the SDK client can optimize without input from the caller.
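One way to read this proposal is sketched below. The standalone `create` function and its `send_request` callable are hypothetical and exist only to show the opt-in semantics; `non_atomic_chunking` is the parameter name suggested in the comment, not an existing SDK argument.

```python
def create(send_request, payloads, non_atomic_chunking=False, batch_size=1000):
    """Send payloads in one request by default, or in chunks when opted in."""
    if not non_atomic_chunking:
        # Default: a single request, so existing behaviour is unchanged.
        return send_request(payloads)
    ids = []
    # Opt-in: split into chunks; a later failure leaves earlier chunks committed.
    for start in range(0, len(payloads), batch_size):
        ids.extend(send_request(payloads[start:start + batch_size]))
    return ids
```

Under this reading, a caller who never passes the flag keeps the original single-request behaviour, and the flag's name itself signals the atomicity trade-off.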

Abel Milash and others added 17 commits April 11, 2026 14:19

Auto-chunk *Multiple operations at 1,000 records (issue #156)

9d715be

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace B shorthand with _MULTIPLE_BATCH_SIZE in chunking tests

4d4a663

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Document atomicity trade-off and <=1000 guidance for chunked operations

e818533

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove chunking note from dataframe.py (chunking is an _odata detail)

6012ffa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update dataframe.py tips to reflect chunking atomicity trade-off

5874ab8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove implementation details from dataframe.py tips

83c41af

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Improve chunking test coverage and walkthrough section order

8fa4ff2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add missing docstrings to test methods

a9b2b5c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Restore method signatures and note formatting to match main

6cc5b6a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Revert argparse change in walkthrough, restore input() prompt

10b9765

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add concurrent chunk dispatch (max_workers) for bulk record operations

d488eef

Add max_workers concurrency support to bulk create/update/upsert

62c0ab6

Improve concurrency test coverage, fix docstrings, revert _MAX_WORKER…

c9bdc91

…S to 3 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix ContextVar propagation to worker threads, revert _MAX_WORKERS to …

2a393ea

…3, fix minor docstrings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove contextvar demo script, tighten README wording

704b91c

Remove examples/advanced/contextvar_thread_demo.py (internal debugging artifact, not intended for the public repo). Trim redundant clause from the large-batch README tip. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Clarify max_workers cap wording in docstrings and comments

4bf4713

Replace 'silently capped' with 'capped to _MAX_WORKERS' throughout _odata.py to be consistent with the UserWarning now emitted on cap. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

abelmilash-msft requested a review from a team as a code owner April 13, 2026 20:05

Copilot AI review requested due to automatic review settings April 13, 2026 20:05

Apply black formatting to _odata.py and test_multiple_chunking.py

0ed2574

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI reviewed Apr 13, 2026


Abel Milash and others added 3 commits April 13, 2026 16:16

Remove extra blank lines between test classes (black formatting)

9e03ea4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: fix black formatting in test_multiple_chunking.py

46949c1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into users/abelmilash/batc…

5ffbc89

…h_chunking

abelmilash-msft mentioned this pull request Apr 20, 2026

Add client-side batching for CreateMultiple/UpdateMultiple/UpsertMultiple #156

Open

sagebree reviewed Apr 20, 2026


Address PR comment: move UserWarning after single-chunk check, stackl…

5b2f6e9

…evel=4 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

abelmilash-msft force-pushed the users/abelmilash/batch_chunking branch 3 times, most recently from 5747ef0 to d867715 Compare April 22, 2026 00:38

Add prefetch_pages to _get_multiple for overlapped page fetching

9a1f490

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

abelmilash-msft force-pushed the users/abelmilash/batch_chunking branch from d867715 to 9a1f490 Compare April 22, 2026 00:43

Merge main into users/abelmilash/batch_chunking

e565c9f

Brings in SQL support (#141), display_name for table creation (#164), relationship/lookup APIs, QueryBuilder additions, and associated tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sagebree reviewed May 11, 2026

abelmilash-msft wants to merge 25 commits into main from users/abelmilash/batch_chunking
